
Off-Policy Evaluation via Off-Policy Classification

Neural Information Processing Systems

In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment. However, comparing models in a real-world environment for the purposes of early stopping or hyperparameter tuning is costly and often practically infeasible. This leads us to examine off-policy policy evaluation (OPE) in such settings. We focus on OPE of value-based methods, which are of particular interest in deep RL with applications like robotics, where off-policy algorithms based on Q-function estimation can often attain better sample complexity than direct policy optimization. Furthermore, existing OPE metrics either rely on a model of the environment or use importance sampling (IS) to correct for the data being off-policy.
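For context on the importance-sampling alternative the abstract contrasts itself with, the basic trajectory-wise IS estimator can be sketched as follows. This is the generic textbook estimator, not the classification-based metric this paper proposes, and the data layout is an illustrative assumption:

```python
def is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary (trajectory-wise) importance-sampling estimate of the
    target policy pi_e's expected return, from episodes gathered under
    the behavior policy pi_b.

    trajectories: list of episodes, each a list of (state, action, reward);
    pi_e(a, s) / pi_b(a, s): action probabilities under each policy.
    """
    estimates = []
    for episode in trajectories:
        rho, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            rho *= pi_e(a, s) / pi_b(a, s)   # cumulative importance ratio
            ret += (gamma ** t) * r          # discounted return
        estimates.append(rho * ret)          # reweighted episode return
    return sum(estimates) / len(estimates)
```

The estimator is unbiased but its variance grows with the product of per-step ratios, which is exactly the weakness that motivates model-free alternatives such as the classification-based metric studied here.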


Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

Neural Information Processing Systems

In reinforcement learning, because policy training in the real world is costly and risky, policies are often trained in a simulation environment and then transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, leading to model misspecification; multiple studies report significant deterioration of policy performance in the real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance over the uncertainty parameter set so as to guarantee performance in the corresponding real-world environment. To obtain such a policy, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibits worst-case performance superior to several baseline approaches.
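The simultaneous gradient ascent descent template mentioned here can be illustrated on a toy one-dimensional max-min problem. This is only a sketch of the optimization scheme, not M2TD3 itself; the objective and step size are invented for illustration:

```python
def simultaneous_gda(f_dx, f_dy, x, y, lr=0.1, steps=500):
    """Simultaneous gradient ascent (on x) / descent (on y) for a
    max-min problem max_x min_y f(x, y). Both variables are updated
    from gradients evaluated at the *same* current pair (x, y)."""
    for _ in range(steps):
        gx, gy = f_dx(x, y), f_dy(x, y)
        x, y = x + lr * gx, y - lr * gy   # ascent on x, descent on y
    return x, y

# Toy objective f(x, y) = -(x - 3)**2 + 0.5 * (y - x)**2:
# the inner min over y sits at y = x, and the outer max then sits at x = 3.
x, y = simultaneous_gda(
    f_dx=lambda x, y: -2 * (x - 3) - (y - x),
    f_dy=lambda x, y: (y - x),
    x=0.0, y=5.0,
)
```

In M2TD3 the ascent variable corresponds to the policy parameters and the descent variable to the uncertainty parameters, with the critic standing in for the objective; the toy above only shows the coupled update rule.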


MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation

Wang, Huanqian, Chen, Chi Bene, Yue, Yang, Tao, Danhua, Guo, Tong, Xie, Shaoxuan, Huang, Denghang, Song, Shiji, Yao, Guocai, Huang, Gao

arXiv.org Artificial Intelligence

Imitation learning has shown immense promise for robotic manipulation, yet its practical deployment is fundamentally constrained by data scarcity. Despite prior work on collecting large-scale datasets, a significant gap to robust spatial generalization remains. We identify a key limitation: individual trajectories, regardless of their length, are typically collected from a \emph{single, static spatial configuration} of the environment. This includes fixed object and target spatial positions as well as unchanging camera viewpoints, which significantly restricts the diversity of spatial information available for learning. To address this critical bottleneck in data efficiency, we propose \textbf{MOtion-Based Variability Enhancement} (\emph{MOVE}), a simple yet effective data collection paradigm that enables the acquisition of richer spatial information from dynamic demonstrations. Our core contribution is an augmentation strategy that injects motion into any movable objects within the environment for each demonstration. This process implicitly generates a dense and diverse set of spatial configurations within a single trajectory. We conduct extensive experiments in both simulation and real-world environments to validate our approach. For example, in simulation tasks requiring strong spatial generalization, \emph{MOVE} achieves an average success rate of 39.1\%, a 76.1\% relative improvement over the static data collection paradigm (22.2\%), and yields up to 2--5$\times$ gains in data efficiency on certain tasks. Our code is available at https://github.com/lucywang720/MOVE.
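As a quick sanity check, the quoted 76.1% relative improvement follows directly from the two reported success rates:

```python
static, move = 0.222, 0.391          # average success rates from the abstract
relative_improvement = (move - static) / static
print(f"{relative_improvement:.1%}")  # prints 76.1%
```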


Reinforcement Learning for Robotic Safe Control with Force Sensing

Lin, Nan, Zhang, Linrui, Chen, Yuxuan, Chen, Zhenrui, Zhu, Yujun, Chen, Ruoxi, Wu, Peichen, Chen, Xiaoping

arXiv.org Artificial Intelligence

For tasks involving complicated manipulation in unstructured environments, traditional hand-coded methods are ineffective, while reinforcement learning can provide a more general and useful policy. Although reinforcement learning is able to obtain impressive results, its stability and reliability are hard to guarantee, which poses potential safety threats. Besides, the transfer from simulation to the real world can also lead to unpredictable situations. To enhance the safety and reliability of robots, we introduce force and haptic perception into reinforcement learning. We demonstrate that the force-based reinforcement learning method can be more adaptive to the environment, especially in sim-to-real transfer. Experimental results show that in an object-pushing task, our strategy is safer and more efficient in both simulation and the real world, so it holds prospects for a wide variety of robotic applications.



VLM-driven Skill Selection for Robotic Assembly Tasks

Kim, Jeong-Jung, Koh, Doo-Yeol, Kim, Chang-Hyun

arXiv.org Artificial Intelligence

Robotic assembly tasks represent one of the most challenging problems in robotics, requiring precise manipulation capabilities combined with sophisticated reasoning about complex multi-step processes. Unlike simple pick-and-place tasks, assembly tasks demand long-term planning that spans multiple sequential actions, where each step must be carefully coordinated with previous and subsequent operations. Furthermore, these tasks require physical understanding of component interactions and spatial relationships between parts [1], [2], [3]. Vision-Language Models (VLMs) have emerged as powerful tools that bridge visual perception and high-level reasoning, offering significant advantages for robotic applications. These models excel at processing visual information while understanding natural language instructions, making them well-suited for complex manipulation tasks.


Online Object-Level Semantic Mapping for Quadrupeds in Real-World Environments

Razavi, Emad, Bratta, Angelo, Soares, João Carlos Virgolino, Recchiuto, Carmine, Semini, Claudio

arXiv.org Artificial Intelligence

We present an online semantic object mapping system for a quadruped robot operating in real indoor environments, turning sensor detections into named objects in a global map. During a run, the mapper integrates range geometry with camera detections, merges co-located detections within a frame, and associates repeated detections into persistent object instances across frames. Objects remain in the map when they are out of view, and repeated sightings update the same instance rather than creating duplicates. The output is a compact object layer that can be queried (class, pose, and confidence), is integrated with the occupancy map, and is readable by a planner. In on-robot tests, the layer remained stable across viewpoint changes.
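The cross-frame association described above (repeated sightings updating one persistent instance instead of spawning duplicates) is commonly realized with class-gated nearest-neighbor matching. The following is a hypothetical minimal sketch of that idea; all names, the distance gate, and the running-mean update are illustrative assumptions, not the paper's implementation:

```python
import math

def associate(detections, objects, max_dist=0.5):
    """Greedily merge one frame's detections into persistent instances.

    detections: list of (cls, (x, y, z), confidence) for the frame.
    objects:    dict instance_id -> {"cls", "pos", "conf", "count"}.
    A detection joins the nearest same-class instance within max_dist
    meters; otherwise it starts a new instance. Positions are kept as
    running means so repeated sightings refine, not duplicate, objects.
    """
    next_id = max(objects, default=-1) + 1
    for cls, pos, conf in detections:
        best, best_d = None, max_dist
        for oid, obj in objects.items():
            if obj["cls"] != cls:          # class gate
                continue
            d = math.dist(pos, obj["pos"])
            if d < best_d:
                best, best_d = oid, d
        if best is None:                   # unseen object: new instance
            objects[next_id] = {"cls": cls, "pos": list(pos),
                                "conf": conf, "count": 1}
            next_id += 1
        else:                              # repeated sighting: update
            obj = objects[best]
            n = obj["count"] + 1
            obj["pos"] = [(p * obj["count"] + q) / n
                          for p, q in zip(obj["pos"], pos)]
            obj["conf"] = max(obj["conf"], conf)
            obj["count"] = n
    return objects
```

Because instances are never deleted when unmatched, objects persist in the map while out of view, matching the behavior the abstract describes.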


DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Zhang, Shaolei, Fan, Ju, Fan, Meihao, Li, Guoliang, Du, Xiaoyong

arXiv.org Artificial Intelligence

Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data agents have shown promising results on specific data tasks but remain fundamentally limited in achieving fully autonomous data science due to their reliance on predefined workflows. In this paper, we introduce DeepAnalyze-8B, the first agentic LLM designed for autonomous data science, capable of automatically completing the end-to-end pipeline from data sources to analyst-grade deep research reports. To tackle high-complexity data science tasks, we propose a curriculum-based agentic training paradigm that emulates the learning trajectory of human data scientists, enabling LLMs to progressively acquire and integrate multiple capabilities in real-world environments. We also introduce a data-grounded trajectory synthesis framework that constructs high-quality training data. Through agentic training, DeepAnalyze learns to perform a broad spectrum of data tasks, ranging from data question answering and specialized analytical tasks to open-ended data research. Experiments demonstrate that, with only 8B parameters, DeepAnalyze outperforms previous workflow-based agents built on the most advanced proprietary LLMs. The model, code, and training data of DeepAnalyze are open-sourced, paving the way toward autonomous data science.


Toward Ownership Understanding of Objects: Active Question Generation with Large Language Model and Probabilistic Generative Model

Hashimoto, Saki, Hasegawa, Shoichi, Ishikawa, Tomochika, Taniguchi, Akira, Hagiwara, Yoshinobu, Hafi, Lotfi El, Taniguchi, Tadahiro

arXiv.org Artificial Intelligence

Robots operating in daily life environments must understand object ownership to carry out instructions naturally given by users, such as "Bring me my cup." Without ownership knowledge, a robot cannot determine which object is being referred to when multiple similar objects exist. This problem is especially evident in kitchens, offices, or laboratories, where objects with similar appearances may belong to different individuals. Relying solely on perceptual features such as location or appearance is insufficient because ownership is inherently context-dependent and often determined by social conventions. Therefore, enabling robots to acquire ownership knowledge is a crucial step toward socially appropriate human-robot interaction. To enable robots to learn object ownership in daily life environments, it is essential to implement a question-generation mechanism that efficiently acquires the necessary information. However, in real-world environments with large numbers of objects, asking about every object is impractical and imposes a heavy burden on users. Although robots can explore the environment to collect visual features of objects, it remains difficult to obtain ownership knowledge because it depends on users and context. Therefore, allowing robots to ask questions based on the current situation enables them to acquire ownership knowledge.

Saki Hashimoto is the presenter of this paper.
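Situation-dependent questioning of this kind is often driven by uncertainty: ask about the object whose ownership the robot is least sure of. The following is a hypothetical sketch of that selection rule; the entropy criterion and all names are illustrative and are not taken from the paper's probabilistic generative model:

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of an ownership distribution {owner: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_question(beliefs):
    """Pick the object whose ownership is most uncertain and phrase a query.

    beliefs: {object_name: {owner: probability}} -- the robot's current
    ownership estimates; the highest-entropy object is the most
    informative one to ask the user about."""
    target = max(beliefs, key=lambda obj: entropy(beliefs[obj]))
    return f"Whose {target} is this?"
```

Asking only about the highest-uncertainty object is one way to limit the number of questions and thus the burden on users that the abstract highlights.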